NBIS ID: SMS_6198
Report Version: 1.0
Request by: Mona N. Högberg ()
Principal Investigator: Mona N. Högberg ()
Organisation: SLU
NBIS Staff: Juliana Assis ()


1 Setup

## LIBRARIES
library("dada2")
library("devtools")
library("dplyr")
library("ggplot2")
library("microbiome")
library("phangorn") 
library("phyloseq") 
library("Rcpp")
library("reshape2")
library("tidyr")
library("vegan")
library("ShortRead")
library("Biostrings")
library("DECIPHER")
library("SensusR")
library("gplots")
library("gridExtra")
library("grid")
library("ggpubr")
library("reshape2")
library("reshape")
library("lulu")
library("ggrepel")
library("ggh4x")
library("RColorBrewer")
library("rITSx")
library("MicEco")
library("agricolae")

2 Version

1.0

  • Support Request
    Request sent by the user: Mona N. Högberg

3 Data

96 samples

  • Type of data

ITS amplicon

  • Data location rackham.uppmax.uu.se /proj/snic2022-22-352

  • Uppmax project ID SNIC 2022/22-352

  • NGI Project ID P9723

  • Database used Unite

4 Tools

NFCore-Ampliseq (Dada2)

#Sample Info
head(sample_info_tab)

5 Workflow

5.1 Replicates Evaluation

Alpha Diversity Plot comparing the Replicates A and B.

Quality control analysis using matched samples from 3 different Transects: A, B and C of the experiment and replicates samples on each Ecotype. Comparison of alpha diversity in technical replicates samples on all Ecotypes from each Transect. ASV richness and ASV Shannon diversity.

  • Beta Diversity Plot comparing the Replicates A and B.

* Beta Diversity Plot comparing the Replicates A and B.

Rarefaction

5.2 Main Analysis

  • Alpha Diversity Plot - A.

5.3 raw alpha-diversity, i.e. without rarefying

  • comparing alpha diversity based on raw data has the huge problem of ignoring sample size differences. Usually there is a trend that more reads correlate with higher diversity.
  • Here, therefore I checked, whether there are significant sample size differences between the groups.
  • We further look at the residuals of a linear fit alpha diversity vs sample size to correct for this confounder.

5.3.1 Check whether sample_sums/sample sizes/library sizes differ between groups

  • sample size adjustment has no influence on richness, so sample_sizes are compared for raw counts

Alpha Diversity = BoxPlot

#https://grunwaldlab.github.io/analysis_of_microbiome_community_data_in_r/07--diversity_stats.html
anova_result <- aov(df2$Observed ~ Transect * Ecotype, data = df2)
summary(aov(df2$Observed ~ Transect * Ecotype, data = df2))
##                  Df Sum Sq Mean Sq F value   Pr(>F)    
## Transect          2   1715     858   0.733 0.488718    
## Ecotype           4 151580   37895  32.400  1.7e-10 ***
## Transect:Ecotype  8  51828    6478   5.539 0.000243 ***
## Residuals        30  35087    1170                     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
aresult <- HSD.test(anova_result, "Ecotype", group = TRUE)

Plot by mean

PY4 <- levels(tt$Ecotype) # get the variables
PY.pairs4 <- combn(seq_along(PY4), 2, simplify = FALSE, FUN = function(i)PY4[i])

##
options(ggrepel.max.overlaps = Inf)
p4 <- tt %>% 
  ggplot(aes(x = Ecotype, y = Observed, color = Ecotype)) + #,shape = Type
  geom_point(size = 5, color = "grey") +
  labs(x = "", y = "") +
  theme_pubr(border = TRUE) +
  scale_colour_manual(values = cols_Ecotype) +
  theme(legend.position="top") +
  theme(legend.title = element_blank()) +
  facet_nested(~Transect) +
  stat_compare_means( method='wilcox.test', p.adjust.method = "BH", label = "p.signif", comparisons = PY.pairs4, size = 2) 

p4 + 
  geom_point(data=p4$data, aes(x = Ecotype, y = Observed, fill =Ecotype, size = 4))  +
  scale_fill_manual(values = cols_Ecotype) 

  • Beta Diversity Plot - A.
gpca  <- ordinate(Plots, "MDS")
# Scree plot
plot_scree(gpca, "Scree Plot for MDS Analysis")

#Remove ASVs that do not show appear more than 5 times in more than half the samples
genefilter = genefilter_sample(P_Plots, filterfun_sample(function(x) x > 5), A=0.5*nsamples(P_Plots))
genefilter_tax = prune_taxa(genefilter, P_Plots)

#Transform to even sampling depth.
genefilter_tax_t = transform_sample_counts(genefilter_tax, function(x) 1E6 * x/sum(x))

genefilter_tax_t.ord <- ordinate(genefilter_tax_t, "NMDS")
## Square root transformation
## Wisconsin double standardization
## Run 0 stress 0.1177618 
## Run 1 stress 0.1177617 
## ... New best solution
## ... Procrustes: rmse 0.0003454533  max resid 0.002010498 
## ... Similar to previous best
## Run 2 stress 0.1161722 
## ... New best solution
## ... Procrustes: rmse 0.04925975  max resid 0.2692136 
## Run 3 stress 0.1363471 
## Run 4 stress 0.1161722 
## ... New best solution
## ... Procrustes: rmse 2.537841e-05  max resid 9.047779e-05 
## ... Similar to previous best
## Run 5 stress 0.1384023 
## Run 6 stress 0.1339011 
## Run 7 stress 0.1344903 
## Run 8 stress 0.1474127 
## Run 9 stress 0.1345524 
## Run 10 stress 0.1177619 
## Run 11 stress 0.1161722 
## ... New best solution
## ... Procrustes: rmse 3.784609e-06  max resid 1.69716e-05 
## ... Similar to previous best
## Run 12 stress 0.1500831 
## Run 13 stress 0.1177619 
## Run 14 stress 0.1161722 
## ... New best solution
## ... Procrustes: rmse 1.210068e-05  max resid 6.047277e-05 
## ... Similar to previous best
## Run 15 stress 0.1151886 
## ... New best solution
## ... Procrustes: rmse 0.01126614  max resid 0.06410793 
## Run 16 stress 0.136347 
## Run 17 stress 0.1350689 
## Run 18 stress 0.1151887 
## ... Procrustes: rmse 5.830998e-05  max resid 0.0002856496 
## ... Similar to previous best
## Run 19 stress 0.1177616 
## Run 20 stress 0.1151886 
## ... Procrustes: rmse 1.47822e-05  max resid 8.000913e-05 
## ... Similar to previous best
## *** Solution reached
#
genefilter_tax_t@sam_data$Transect <- factor(genefilter_tax_t@sam_data$Transect, levels = c(1,2,3), labels = c("1","2", "3"))

genefilter_tax_t@sam_data$Ecotype <- factor(genefilter_tax_t@sam_data$Ecotype, levels = c("Meadow","Alder zone","Spruce-Alder", "Spruce","Pine"), labels = c("Meadow","Alder","Spruce-Alder", "Spruce","Pine"))

#
ord_DataFrame_tx <- plot_ordination(genefilter_tax_t, genefilter_tax_t.ord, "samples", justDF ="TRUE")

#
ordplot_tx <- (ordplot <- plot_ordination(genefilter_tax_t, genefilter_tax_t.ord, "samples", color="Ecotype", shape = "Transect") + 
               geom_hline(yintercept = 0, linetype = 'dashed', alpha = 0.3) + 
               geom_vline(xintercept = 0, linetype = 'dashed', alpha = 0.3) +
               geom_point(size = 5, color = "grey") +
               scale_shape_manual(values=c(23,21,24)) +
               theme_pubr(border = TRUE) +
               coord_fixed(ratio = 1) +
               theme(axis.text=element_text(size=14), 
                     axis.text.x = element_text(size = 12, hjust = 0.5), 
                     axis.title.y = element_text(size = 18),
                     legend.text=element_text(size=14), 
                     legend.title=element_text(size=0),
                     legend.position="bottom",
                     axis.title.x = element_text(size = 18), 
                     strip.text.x = element_text(size = 20, face = "bold"))) +
  scale_color_manual(values = cols_Ecotype)#+
ordplot_tx$layers <- ordplot_tx$layers[-1]
ordplot_tx + 
  geom_point(data=ord_DataFrame_tx, aes(x = NMDS1, y = NMDS2, fill =Ecotype, size = 4))  +
  scale_fill_manual(values = cols_Ecotype) 

pord2 = (ordplot <- plot_ordination(genefilter_tax_t, genefilter_tax_t.ord, "samples", color="Ecotype", shape = "Transect") + 
               geom_hline(yintercept = 0, linetype = 'dashed', alpha = 0.3) + 
               geom_vline(xintercept = 0, linetype = 'dashed', alpha = 0.3) +
               geom_point(size = 5, color = "grey") +
               scale_shape_manual(values=c(23,21,24)) +
               geom_polygon(aes(fill=Ecotype, alpha = 0.8)) + 
               theme_pubr(border = TRUE) +
               coord_fixed(ratio = 1) +
               theme(axis.text=element_text(size=14), 
                     axis.text.x = element_text(size = 12, hjust = 0.5), 
                     axis.title.y = element_text(size = 18),
                     legend.text=element_text(size=14), 
                     legend.title=element_text(size=0),
                     legend.position="bottom",
                     axis.title.x = element_text(size = 18), 
                     strip.text.x = element_text(size = 20, face = "bold"))) +
  scale_color_manual(values = cols_Ecotype)

pord2$layers <- pord2$layers[-1]
pord2 + 
  geom_point(data=ord_DataFrame_tx, aes(x = NMDS1, y = NMDS2, fill =Ecotype, size = 4))  +
  scale_fill_manual(values = cols_Ecotype) 

  • Beta Diversity Plot - B.

Ecotype

  • Permutational analysis of variance
  • Relative abundance of the top 50 ASVs

#Heatmap with Deseq2

6 Summary

Help is needed with running the “nfcore/ampliseq” pipeline developed at NGI for the analyses of fungal ITS1 amplicons, Illumina Miseq analysis NGI project ID P5953 (M.Hogberg_17_01_project summary from 2018-01-19 by Chuan Wang refers to P9723, >=Q30 (mean(SD), 70(2) (%), Sum Reads=15 650 000, Mean reads per sample 171 711, 1 pool of amplicons, 1 flowcell v3, PE 2x301bp (validated method), demultiplexing, quality control and raw data delivery on Uppmax (validated method). Agreement number M.Hogberg_16_01-20160826. Grus delivery project delivery 00654. Because my support application 2019-01-17 was rejected, I have collaborated with partners in US on this matter but all is extremely delayed for known reasons. I got some hope today when I read about the recently developed pipeline for fungal analyses! Unfortunately, I have no programming skills but have a BSc in Molecular Biology.

Short summary of the work.

7 Further Work

Further steps to be taken (if needed).

8 References

Relevant references for methods, tools etc.

9 Deliverables

Files delivered to the user with descriptions.

9.1 Directory

/data/processed/b/

8 directories, 18 files

Total size is XX GB.

10 Timeline

11 Practical Info

11.1 Data responsibility

The responsibility for data archiving lies with the PI of the project. We do not offer long-term storage or retrieval of data.

  • NBIS & Uppnex: We kindly ask that you remove the files from UPPMAX/UPPNEX. The main storage at UPPNEX is optimized for high-speed and parallel access, which makes it expensive and not the right place for longer time archiving. Please consider others by not taking up the expensive space. Please note that UPPMAX is a resource separate from the Bioinformatics Platform, administered by the Swedish National Infrastructure for Computing (SNIC) and SNIC-specifc project rules apply to all projects hosted at UPPMAX.
  • Sensitive data : Please note that special considerations may apply to the human-derived legally considered sensitive personal data. These should be handled according to specific laws and regulations as outlined e.g. here.
  • Long-term backup : We recommend asking your local IT for support with long-term data archiving. Also a newly established Data Office at SciLifeLab may be of help to discuss other options.

11.2 Acknowledgments

If you are presenting the results in a paper, at a workshop or conference, we kindly ask you to acknowledge us.

  • NBIS staff are encouraged to be co-authors when this is merited in accordance to the ethical recommendations for authorship, e.g. ICMJE recommendations. If applicable, please include Juliana, Assis Geraldo, National Bioinformatics Infrastructure Sweden, Science for Life Laboratory, NBIS, as co-author. In other cases, NBIS would be grateful if support by us is acknowledged in publications according to this example:

“Support by NBIS (National Bioinformatics Infrastructure Sweden) is gratefully acknowledged.”

“The computations were performed on resources provided by SNIC through Uppsala Multidisciplinary Center for Advanced Computational Science (UPPMAX) under Project SNIC 2022-22-352.”

  • NGI : For publications based on data from NGI Sweden, NGI, SciLifeLab and UPPMAX should be acknowledged like so:

“The authors would like to acknowledge support from Science for Life Laboratory (SciLifeLab), the National Genomics Infrastructure (NGI), and Uppsala Multidisciplinary Center for Advanced Computational Science (UPPMAX) for providing assistance in massive parallel sequencing and computational infrastructure.”

12 Support Completion

You should soon be contacted by one of our managers with a request to close down the project in our internal system and for invoicing matters. If we do not hear from you within 30 days the project will be automatically closed and invoice sent. Again, we would like to remind you about data responsibility and acknowledgements, see sections: Data Responsibility and Acknowledgments.

You are welcome to come back to us with further data analysis request at any time via http://nbis.se/support/support.html.

Thank you for using NBIS.